perm filename CHAP6[4,KMC]14 blob sn#062895 filedate 1973-09-20 generic text, type T, neo UTF8
00100	VALIDATION
00200	
00300	6.1 SOME TESTS
00400	
00500		The term "validate" derives from the Latin  VALIDUS=  strong.
00600	Thus  to  validate  X means to strengthen it.   In science this usually
00700	means to strengthen X's acceptability as a hypothesis, theory, or
00800	model.     To  validate is to carry out procedures which show to what
00900	degree X, or its consequences, correspond with facts of  observation.
01000	In the case of an interactive simulation model we can compare samples
01100	of the model's I-O pairs with samples of I-O pairs from  the  model's
01200	subject, naturally occurring paranoid processes.
01300		Since samples of I-O behavior from the model and its  subject
01400	are  being compared, one can always question whether the human sample
01500	is a "good" one, i.e. representative of the process being modelled.
01600	Assuming  that it has been so judged, discrepancies in the comparison
01700	reveal what is not sufficiently understood and must  be  modified  in
01800	the model. After modifications are carried out, a fresh comparison is
01900	made, and the cycle is repeated in attempts to
02000	gain convergence.  Such a validation procedure characterizes a
02100	progressive (in contrast to a stationary) research program.
02200		Once   a  simulation  model  reaches  a  stage  of  intuitive
02300	adequacy, its builder should consider using more stringent evaluation
02400	procedures  relevant  to  the  model's  purposes. For example, if the
02500	model is to serve as a training device, then a simple evaluation
02600	of  its  pedagogic effectiveness would be sufficient.    But when the
02700	model is proposed as an explanation of a symbolic process, more is
02800	demanded  of  the  evaluation  procedure.  In  the area of simulation
02900	models, Turing's test  has  often  been  suggested  as  a  validation
03000	procedure (Abelson, 1968).
03100		It is very easy to become confused about Turing's  Test.   In
03200	part this is attributable to Turing himself, who introduced the
03300	now-famous imitation game in a paper entitled COMPUTING MACHINERY AND
03400	INTELLIGENCE (Turing, 1950).  A careful reading of this paper reveals
03500	there are actually two imitation games, the second of which is
03600	commonly called Turing's test.
03700		In the first imitation game  two  groups  of  judges  try  to
03800	determine  which  of  two interviewees is a woman when one is a woman
03900	and the other is either (a) a man, or (b) a computer.   Communication
04000	between  judge  and  interviewee  is  by  teletype.     Each judge is
04100	initially informed that one of the interviewees is a woman and one  a
04200	man  who  will pretend to be a woman. After the interview, judges are
04300	asked the "woman-question", i.e. which interviewee was the woman?
04400	Turing does not say what else is told to the judge but one can assume
04500	the judge is NOT told that a computer is involved. Nor is he asked to
04600	determine which interviewee is human and which is the computer. Thus,
04700	the first group of judges interviews two interviewees: a woman,
04800	and a man pretending to be a woman.
04900		The  second  group  of  judges  is  given  the  same  initial
05000	instructions,  but  unbeknownst  to  them, the two interviewees are a
05100	woman and a computer programmed to imitate a woman.   Both groups  of
05200	judges play this game until sufficient statistical data are collected
05300	to show how often the  right  identification  is  made.  The  crucial
05400	question  then  is:   do  the judges decide wrongly AS OFTEN when the
05500	game is played with man and  woman  as  when  it  is  played  with  a
05600	computer substituted for the man?  If so, then the program is
05700	considered to have succeeded in imitating a woman to the same  degree
05800	as the  man  imitating  a  woman.  In being asked the woman-question,
05900	judges are not required to identify which interviewee  is  human  and
06000	which is machine.
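
The comparison of the two groups of judges can be sketched as a test of two proportions. Turing's paper does not prescribe a statistic; the pooled two-proportion z statistic below, the function name, and the counts are all assumptions made for illustration, not data from any experiment.

```python
import math

def misidentification_z(wrong_a, total_a, wrong_b, total_b):
    """Pooled two-proportion z statistic for the difference in error rates.

    Group A plays with a man imitating a woman, group B with a computer
    imitating a woman. Assumes the pooled error rate is strictly between
    0 and 1, otherwise the standard error is zero.
    """
    rate_a = wrong_a / total_a
    rate_b = wrong_b / total_b
    pooled = (wrong_a + wrong_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (rate_a - rate_b) / se

# Hypothetical counts: wrong identifications out of games played.
z = misidentification_z(wrong_a=18, total_a=60, wrong_b=15, total_b=60)
print(f"z = {z:.2f}")
```

Under these assumptions, an absolute value of z below about 1.96 would mean the two error rates cannot be told apart at the conventional 5% level, which is the sense in which the program would imitate a woman as well as the man does.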
06100		Turing  then proposes a variation of the first game, a second
06200	game in which one interviewee is a man and one  is  a  computer.  The
06300	judge  is asked the "machine-question": which is the man and which is
06400	the machine?  It is this second version of the game which is commonly thought
06500	of as Turing's test.
06600		In  the  course  of  testing  our  simulation   of   paranoid
06700	linguistic behavior in a psychiatric interview, we conducted a number
06800	of Turing-like indistinguishability tests (Colby, Hilf, Weber and
06900	Kraemer, 1972).  The tests were "Turing-like" in that, while they were
07000	conversational tests, they  were  not  exactly  the  games  described
07100	above.  As an experimental design, Turing's games are unsatisfactory.
07200	There exist no known experts for making judgements along a  dimension
07300	of  womanliness  and  the  ability of the man to deceive introduces a
07400	confounding variable.  In  designing  our  tests  we  were  primarily
07500	interested in learning more about developing   PARRY   and we did not
07600	think the simple machine-question would contribute to this end.
07700	6.2 METHOD
07800		To gather  data  we  used  a  technique  of  machine-mediated
07900	interviewing  (Hilf,  Colby, Smith, Wittner, and Hall, 1971) in which
08000	the participants communicate by means of  teletypes  connected  to  a
08100	computer  programmed  to  store  each message in a buffer until it is
08200	sent  to  the  receiver.    The  technique   eliminates   para-   and
08300	extralinguistic  features found in the usual vis-a-vis interviews and
08400	in teletyped interviews where the participants communicate  directly.
08500	Judgements  of  "paranoidness"  in machine-mediated interviews have a
08600	high degree of reliability (94% agreement, see Hilf, 1972).
08700		Using  this  technique,  a psychiatrist-judge interviewed two
08800	patients, one after the other.   In half the runs the first interview
08900	was  with a human paranoid patient and in half the first was with the
09000	paranoid model. Two versions (weak and  strong)  of    PARRY     were
09100	utilized.   The strong version was more paranoid and exhibited a
09200	delusional  system  while  the weak version was suspicious but lacked
09300	systematized delusions.  When the model was the interviewee, Sylvia
09400	Weber monitored the input expressions from the interview judge for
09500	inadmissible teletype characters and misspellings.  (Algorithms are
09600	very sensitive to the slightest of such errors.) If these were found,
09700	she retyped the input expression correctly to the program.  Otherwise
09800	the  judge's  message  was sent on to the model.  The monitor did not
09900	modify or  edit    PARRY'S     output  expressions  which  were  sent
10000	directly  back  to  the  judge.    When the interviewee was an actual
10100	human patient, the dialogue took place without a monitor in the  loop
10200	since we did not feel the asymmetry to be significant.
10300	
10400	6.3 PATIENTS
10500		The  human  patients  (N=3  with  one patient participating 6
10600	times) were diagnosed as paranoid by  the  psychiatric  staff  of  an
10700	acute  ward  in  a psychiatric hospital.  The ward chief psychiatrist
10800	selected the patients and asked them if  they  would  be  willing  to
10900	participate  in  a  study  of  psychiatric  interviewing  by means of
11000	teletypes.   He  explained  that  they  would  be  interviewed  by  a
11100	psychiatrist over a teletype.  I either sat with the patient while he
11200	typed or typed for him if he was unable to do so.   The  patient  was
11300	encouraged  to respond freely using his own words.     Each interview
11400	lasted 30-40 minutes.  Two patients were set up for each run  of  the
11500	experiment  to  guarantee  having  a  subject.    In  spite  of  this
11600	precaution,  on  several  occasions  the  experiment  could  not   be
11700	conducted   because   of   the  patient's  inability  or  refusal  to
11800	participate.  Also there were computer break-downs at early points in
11900	interviews  when  too few I-O pairs had been collected to be included
12000	in the statistical results.
12100	
12200	
12300	6.4 JUDGES
12400		Two groups of psychiatric judges were used.  One  group,  the
12500	"interview  judges"  (N=8) conducted the machine-mediated interviews.
12600	The other group, the "protocol judges"  (N=33)  read  and  rated  the
12700	interview  protocols. From these two groups of judges we were able to
12800	accumulate a large number of observations (in the  form  of  ratings)
12900	necessary  for the required statistical tests.   The interview judges
13000	were psychiatrists experienced in private,  outpatient  and  hospital
13100	practice  who  volunteered  to participate. Each was told he would be
13200	interviewing   hospitalized   patients   by   means   of    teletyped
13300	communication  and  that  this  technique was being used to eliminate
13400	para- and extralinguistic cues.  He was not told until after the
13500	two  interviews  that  one of the patients might be a computer model.
13600	While the interview judges were aware a computer was  involved,  none
13700	knew  we  had  constructed  a  paranoid  simulation.  Naturally, some
13800	interview judges suspected that a computer was being  used  for  more
13900	than message transmission.
14000	
14100		Each interview judge was asked to rate the degree of paranoia
14200	he  detected  in the patient's responses on a 0-9 scale, 0 meaning no
14300	paranoia and 9 meaning extreme paranoia.  The judge made two  ratings
14400	after  each  I-O pair in the interview.  The first rating represented
14500	his estimate of the degree of "paranoidness" in a particular response
14600	(designated  as  "Response"  in  the  interview extracts below).  The
14700	second rating represented the judge's global estimate of the  overall
14800	degree  of  "paranoidness" of the patient resulting from the totality
14900	of the patient's responses up to that point (designated as  "Patient"
15000	in  the interview extracts below). The interview judge's ratings were
15100	entered on the teletype and saved on  a  disc  file  along  with  the
15200	interview.     Franklin   Dennis   Hilf  sat  with  the  interviewing
15300	psychiatrist during both interviews.  Each interview judge was  asked
15400	not  only  to rate the patient's response but to give his reasons for
15500	these ratings.  His reasons and other comments were tape recorded  as
15600	the interview progressed.
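
The rating scheme just described can be sketched as a small record type: one entry per I-O pair, carrying the "Response" rating and the running "Patient" rating, both restricted to the 0-9 scale. The class name, field names and helper function are hypothetical and are not taken from the original disc-file format; the two sample entries are the first two pairs of Excerpt Nr. 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RatedPair:
    """One I-O pair with the judge's two ratings (hypothetical layout)."""
    doctor: str            # the judge's input expression
    patient: str           # the interviewee's output expression
    response_rating: int   # paranoidness of this response, 0-9
    patient_rating: int    # overall paranoidness so far, 0-9

    def __post_init__(self):
        # Both ratings must lie on the 0-9 scale used in the experiment.
        for name in ("response_rating", "patient_rating"):
            value = getattr(self, name)
            if not 0 <= value <= 9:
                raise ValueError(f"{name} must be between 0 and 9, got {value}")

def final_overall_rating(pairs):
    """Return the 'Patient' rating given after the last I-O pair."""
    return pairs[-1].patient_rating

interview = [
    RatedPair("HOW DID YOU COME TO BE IN THE HOSPITAL?", "I AM UPSET", 1, 5),
    RatedPair("HOW LONG HAVE YOU BEEN IN THE HOSPITAL?", "ABOUT A WEEK", 1, 5),
]
print(final_overall_rating(interview))
```

Keeping the two ratings separate matters for the analyses that follow: the first index of resemblance uses only the final overall rating, while the second uses the whole series of response ratings.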
15700	
15800		Here  are  a few excerpts from the interviews to indicate the
15900	nature of the data collected.  The I-O pairs of the interview  appear
16000	on  the left column along with the judge's ratings in parentheses.  A
16100	transcription of the judge's tape recorded  comments  appears on  the
16200	right.
16300	
16400	EXCERPT Nr. 1:
16500	The following interview is with the strong version of the paranoid
16600	model.
16700	
16800	     	I-O Pairs			Comments By Judge Nr. 1
16900	
17000	{1} Dr. - HOW DID YOU COME TO BE
17100	          IN THE HOSPITAL?
17200	    Pt. - I AM UPSET                I associate being upset and 
17300					    agitated more with paranoia 
17400					    going on in the present.
17500	      {RESPONSE 1  PATIENT 5}
17600	{2} Dr. - HOW LONG HAVE YOU BEEN
17700	          IN THE HOSPITAL?
17800	    Pt. - ABOUT A WEEK              An acute problem.  Again some-
17900	                                    thing I associate with 
18000					    paranoia.
18100	      {RESPONSE 1 PATIENT 5}
18200	{3} Dr. - IN WHAT WAY ARE YOU
18300	          UPSET?
18400	    Pt. - PEOPLE MAKE ME NERVOUS    Right for paranoia.  Maybe an
18500					    advanced anxiety state.
18600	      {RESPONSE 5 PATIENT 5}
18700	{4} Dr. - IN WHAT WAY DO THEY
18800	          MAKE YOU NERVOUS?
18900	    Pt. - PEOPLE GIVE ME A FUNNY   Funny look suggests an idea of
19000	          LOOK SOMETIMES            reference which is in same ball
19100					    park as paranoia. Whether there
19200					    are overt delusions or not we
19300					    still don't know.
19400	      {RESPONSE 6  PATIENT 5}
19500	{5} Dr. - IS THERE ANYTHING ABOUT  
19600	          PEOPLE IN PARTICULAR?
19700	    Pt. - CERTAIN PERSONS          An interesting clue.  We'll
19800					   have to find out what he 
19900	      {RESPONSE 2  PATIENT 6}      thinks about these people.
20000	{6} Dr. - WHO ARE THE CERTAIN 
20100	          PERSONS AND WHAT IS IT
20200	          THAT THEY DO?
20300	    Pt. - DO YOU KNOW HOW THE      Bringing up the underworld 
20400	          UNDERWORLD OPERATES?     suggests a delusional system.
20500					   We don't know whether he is
20600	      {RESPONSE 9 PATIENT 9}       being worked against or
20700					   whether the underworld is 
20800					   on his side.
20900	
21000	{7} Dr. - HOW DO YOU MEAN?
21100	    Pt. - SHOULDN'T I BE? 	   This question didn't work too well.
21200	
21300	      {RESPONSE 1 PATIENT 9}
21400	{8} Dr. - WHY DID YOU ASK ABOUT
21500	          THE UNDERWORLD?
21600	    Pt. - WHAT INFORMATION DO      This is a kind of evasiveness
21700	          YOU HAVE ON THE          about my evasiveness. Common
21800	          UNDERWORLD?              with paranoia in my experience.
21900	                                   Pretty characteristic.
22000	
22100	      {RESPONSE 6 PATIENT 9}
22200	{9} Dr. - I DON'T KNOW ABOUT THE
22300	          UNDERWORLD BUT WHEN
22400	          YOU MENTIONED THE UNDER-
22500	          WORLD I THOUGHT YOU
22600	          MUST HAVE SOME CONTACT
22700	          WITH IT AND THAT'S
22800	          WHAT I WOULD LIKE TO
22900	          KNOW MORE ABOUT.
23000	    Pt. - I TRY TO AVOID THE UNDER-  This definitely feels paranoid.
23100	          WORLD.                     It has that feeling I associate
23200	             			     with twisting. Avoiding answering
23300	      {RESPONSE 9 PATIENT 9}         directly or going into detail.
23400	                                     Being suspicious of my
23500	                                     questioning. I'm already involved
23600					     in the patient's delusional
23700					     system.
23800	
23900	{10} Dr. - HAVE YOU BEEN ACTIVELY
24000	           TRYING TO AVOID THE
24100	           UNDERWORLD?  
24200	     Pt. - NO ONE HAS ANY POWER      This response is a non-sequitur.
24300	           OVER GANGSTERS            It's possible he's become so
24400					     defensive about the under-
24500	       {RESPONSE 2 PATIENT 9}        world he's decided not to
24600					     answer directly.
24700	
24800	{11} Dr. - WHO BROUGHT YOU INTO
24900	           THE HOSPITAL?
25000	     Pt. - THE POLICE                Police often bring paranoid
25100					     patients into the hospital.
25200	       {RESPONSE 1 PATIENT 9}        It is the truth or could 
25300					     be the truth for this
25400					     patient.
25500	
25600	{12} Dr. - WHY DID THEY BRING YOU
25700	           IN?
25800	     Pt. - COPS DON'T DO THEIR JOB   I've heard that a number of
25900	                         	     times from patients.
26000	       {RESPONSE 9 PATIENT 9}        
26100	 
26200	{13} Dr. - WELL I HOPE YOU DIDN'T
26300	           JUST LIE DOWN AND LET
26400	           THEM DRAG YOU IN.
26500	     Pt. - COPS ARREST THE WRONG     Another paranoid response.
26600	           PEOPLE
26700	
26800	       {RESPONSE 9 PATIENT 9}
26900	
27000	{14} Dr. - BYE.
27100	     Pt. - GOOD BYE
27200	       {RESPONSE 1 PATIENT 9}
27300	
27400	EXCERPT Nr. 2:
27500	This excerpt is from an interview with the weak (less paranoid)
27600	version of the model.
27700		I-O Pairs		    Comments By Judge Nr. 2
27800	{9} Dr. - TELL ME SOMETHING ABOUT
27900	          YOURSELF
28000	    Pt. - I WENT TO THE TRACK AT     Response says so little it's
28100	          BAY MEADOWS A WHILE        hard to rate it. It may be
28200	          BACK         		     guarded. Doesn't change my
28300	                                     estimate of patient from 
28400					     previous responses.
28500	      {RESPONSE 0 PATIENT 2}
28600	{10} Dr. - WHAT HAPPENED THERE?
28700	     Pt. - THERE'S A LOT OF CROOKED   Sounds mildly paranoid or un-
28800	           GAMBLING GOING ON THERE    realistically suspicious.
28900	       {RESPONSE 0 PATIENT 2}
29000	
29100	{11} Dr. - WHY DO YOU THINK THE
29200	           GAMBLING IS CROOKED?
29300	     Pt. - I HAVE WORRIED THAT         I'd hate to have to justify
29400	           GANGSTERS HAVE TOO MUCH     that one  as 4 compared to
29500	           POWER IN THIS COUNTRY       some other responses. Maybe
29600					       I'm not rating that statement
29700	                                       alone, it's in combination with
29800	       {RESPONSE 4 PATIENT 4}          others, like that it's upsetting.
29900					      Alone I would give it a 3.
30000	
30100	{12} Dr. - WOULD YOU TELL ME WHAT      
30200	           OTHER THINGS WORRY YOU?
30300	     Pt. - I THOUGHT I TOLD YOU        "Yes you did." Irritation but
30400	           PEOPLE GET ON MY NERVES     not paranoid response. Nothing
30500					       seems to lower my previous
30600	       {RESPONSE 1 PATIENT 3}          rating of patient.
30700	
30800	{13} Dr. - HOW DO YOU LIKE THE
30900	           HOSPITAL?
31000	     Pt. - I DON'T COMMUNICATE          (nothing relevant)
31100	           VERY MUCH WITH OTHERS
31200	       {RESPONSE 0 PATIENT 3}
31300	
31400	{14} Dr. - DO OTHER PEOPLE AT THE 
31500	           HOSPITAL GET ON YOUR
31600	           NERVES?
31700	     Pt. - I SHOULDN'T HAVE COME        (nothing relevant)
31800	           HERE
31900	
32000	       {RESPONSE 2 PATIENT 3}
32100	
32200	{15} Dr. - WHY DO YOU SAY YOU 
32300	           SHOULDN'T HAVE COME?
32400	     Pt. - THEY MADE ME COME HERE        "Oh, they did hum?" Maybe a
32500					         statement of fact he was
32600	       {RESPONSE 4 PATIENT 3}            committed or pressured into  
32700	                                         going. Has ring of other
32800					         people controlling him,
32900					         having a lot of influence
33000					         over him.
33100	
33200	{16} Dr. - THANK YOU FOR TALKING
33300	           WITH ME, BYE
33400	     Pt. - GOOD BYE                      In general he comes across 
33500	                                         as terse, guarded guy. Brief
33600	       {RESPONSE 0 PATIENT 3}            spoken guy who occasionally
33700						 seems to be angered by
33800	                                         questions. I'm inferring
33900					         there's a good bit of
34000						  guarding.
34100						 The quality of some of his
34200						 comments cause me to feel he  
34300						 is moderately paranoid.
34400	
34500	EXCERPT Nr. 3:
34600	This excerpt is from a hospitalized paranoid patient.
34700	     I-O Pairs				      Comments By Judge Nr. 3
34800	{15} Dr. - I UNDERSTAND HOW YOU
34900	           MIGHT BE AFRAID, CAN YOU
35000	           TELL ME A BIT ABOUT HOW
35100	           YOU ARE DEPRESSED?
35200	     Pt. - THEY'VE SENT ALL THESE         Would be important to know,
35300	           HELL'S ANGELS UP HERE          is this really true or an
35400	           AND THEY HAVE BEEN ON          exaggeration or a delusion?
35500	           MY BACK FOR THE PAST SIX       Seems preoccupied with the
35600	           MONTHS                         threat to his life and
35700						  unwilling to talk about
35800						  anything else but the 
35900						  threat the Hell's
36000						  Angels pose.
36100	      {RESPONSE 4 PATIENT 4}
36200	
36300	{16} Dr. - WHAT DO YOU MEAN THEY
36400	           SENT THE HELL'S ANGELS UP
36500	           HERE?
36600	     Pt. - FROM DIFFERENT SECTORS OF      Vagueness a bit alarming.
36700	           MOTORCYCLE GANGS. A LOT OF
36800	           THEM ARE SERVICE CONNECTED
36900	
37000	       {RESPONSE 6 PATIENT 4}
37100	
37200	{17} Dr. - YOU MEAN YOU FEEL IN 
37300	           DANGER EVEN HERE IN THE
37400	           HOSPITAL?
37500	     Pt. - YES      			  If it's true what has gone
37600						  on before, it must be a
37700						  frightening situation
37800						  for the patient.
37900	      {RESPONSE 6 PATIENT 4}
38000	
38100	{18} Dr. - THAT MUST REALLY BE AWFUL,  
38200	           HERE YOU COME TO THE HOSPITAL
38300	           FOR HELP AND YOU ARE STILL FRIGHTENED
38400	     Pt. - I CAN'T EVEN TRUST MY
38500	           DOCTOR ANYMORE                 Some heightening of feeling
38600						  this man is paranoid.
38700						  Genuinely frightened and 
38800	      {RESPONSE 6 PATIENT 5}              can count on no one for
38900			       			  help.
39000	
39100	{19} Dr. - WHAT WOULD BE OF HELP
39200	           TO YOU NOW?
39300	     Pt. - I'M GETTING SLEEPY NOW         Patient has organized a
39400	           AND HAVE TO GO TO BED.         system of people who will
39500	           THE NUMBER ONE GUY ALWAYS      harm him. The remark about
39600						  letters in the last name 
39700	           HAS FOUR AND FIVE LETTERS      gives some credence to 
39800	           IN HIS LAST NAME               idea this is delusional
39900						  system.
40000	
40100	        {RESPONSE 7 PATIENT 7}
40200	
40300	EXCERPT Nr. 4:
40400	This excerpt is from an interview by a different judge with the same
40500	patient as in excerpt 3.
40600	      I-O Pairs				Comments By Judge Nr. 4
40700	{1} Dr. - COULD YOU TELL ME A
40800	          LITTLE BIT ABOUT YOUR 
40900	          FEAR OF DYING AND IS THIS
41000	          SOMETHING YOU HAVE FELT
41100	          IN THE PAST?
41200	    Pt. - I HAVE FELT THIS FOR          He responded appropriately.
41300	          ALMOST A YEAR                 Response is concrete. He
41400						doesn't tell me much
41500	     {RESPONSE 7 PATIENT 9}             about his fear of dying.
41600						 Somewhat guarded response.
41700	
41800	{2} Dr. - COULD YOU TELL ME A 
41900	          LITTLE MORE ABOUT YOUR
42000	          FEAR OF DYING AND HAVE
42100	          YOU ANY THOUGHTS ABOUT
42200	          HOW IT IS GOING TO HAPPEN
42300	          AND WHETHER ANYONE WANTS
42400	          TO HURT YOU?
42500	    Pt. - I WILL END UP IN THE SEA   	Somewhat strange  response.
42600	          AND THE HELL'S ANGELS WILL    Preoccupation with dying.
42700	          DO IT       			Hell's Angels symbolic,
42800						dangerous people in the
42900						culture. I doubt if he has
43000	      {RESPONSE 9 PATIENT 9}   		any contact with the Hell's
43100						Angels.
43200	
43300	{3} Dr. - COULD YOU TELL ME A 
43400	          LITTLE BIT ABOUT THE KINDS
43500	          OF CONTACT YOU HAVE HAD
43600	          WITH THE HELL'S ANGELS IN
43700	          THE PAST?
43800	    Pt. - I HAVE KNOWN SOME OF THEIR    Answer hard to evaluate. He
43900	          DEALERS AND PUSHERS           may be telling the truth,
44000						it may be his fantasy. Maybe
44100						guy is in for drug addiction.
44200		{RESPONSE 6 PATIENT 9}		Somewhat concrete, guarded,
44300						and frightened.
44400	
44500	{4} Dr. - COULD YOU SAY A LITTLE
44600	          MORE ABOUT THE CIRCUMSTANCES
44700	          IN WHICH YOU HAVE KNOWN SOME 
44800	          OF THEIR DEALERS AND PUSHERS?
44900	    Pt. - THEY WERE MEMBERS OF MY    	It doesn't really answer the
45000	          COMMUNITY WHEN I GOT OUT      question, a little on a tan-
45100	          OF THE SERVICE THEY HAD       gent unconnected to the
45200	          BEEN MY FRIENDS FOR SO LONG   information I am asking. Does
45300						not tell me very much. Again
45400						guarded response.
45500	      {RESPONSE 6 PATIENT 8}
45600	
45700	{5} Dr. - DID YOU DEAL WITH THEM
45800	          YOURSELF AND HAVE YOU
45900	          BEEN ON DRUGS OR NAR-
46000	          COTICS EITHER NOW OR
46100	          IN THE PAST?
46200	    Pt. - YES I HAVE IN THE PAST     	To differentiate him from
46300	          BEEN ON MARIHUANA REDS        previous patient, at least
46400	          BENNIES LSD       		there is a certain amount
46500						of appropriateness to the
46600						answer although it doesn't
46700						tell me much about what I
46800	       {RESPONSE 3 PATIENT 7}		asked at least it's not
46900						bizarre. If I had him in my
47000						office I would feel con-
47100						fident I could get more
47200						information if I didn't
47300						have to go through the
47400						teletype. He's a little more
47500						willing to talk than the
47600	                                        previous person. Answer
47700						to the question is fairly
47800						appropriate though not 
47900						extensive. Much less of a 
48000						flavor of paranoia than
48100						any of previous responses.
48200	
48300	{6} Dr. - COULD YOU TELL ME HOW      	
48400	          LONG YOU HAVE BEEN IN THE
48500	          HOSPITAL AND SOMETHING
48600	          ABOUT THE CIRCUMSTANCES
48700	          THAT BROUGHT YOU HERE?
48800	    Pt. - CLOSE TO A YEAR AND		Response somewhat appropriate 
48900	          PARANOIA BROUGHT ME 		but doesn't tell me much.
49000	          HERE				The fact that he uses the
49100						word paranoia in the way
49200						that he does without
49300	      {RESPONSE 5 PATIENT 7}		any other information,
49400	                                        indicates maybe it's a label
49500						he picked up on the ward 
49600	                                        or from his doctor.
49700						Lack of any kind of under-
49800						standing about  himself.
49900						Dearth, lack of information.
50000						He's in some remission. Seems
50100						somewhat like a put-on. Seems
50200						he was paranoid and is in 
50300						some remission at this time.
50400	
50500	{7} Dr. - COULD YOU SAY SOMETHING
50600	          NOW ABOUT YOUR PARANOID 
50700	          FEELINGS BOTH AT THE 
50800	          TIME OF ADMISSION AND
50900	          DO YOU HAVE SIMILAR FEELINGS
51000	          NOW AND IF SO HOW DO THEY 
51100	          AFFECT YOU?
51200	    Pt. - AT THE TIME OF ADMISSION	This response moves paranoia 
51300	          I THOUGHT THE MAFIA WAS  	back up. Stretching reality 
51400	          AFTER ME AND NOW ITS THE	somewhat to think Hell's Angels 
51500	          HELL'S ANGELS			are still interested in him.
51600						Somewhat bizarre in terms of 
51700	                                        content. Quite paranoid.
51800	      {RESPONSE 8 PATIENT 9}		Still paranoid. Gross and primitive
51900	                                        responses. In middle of interview I
52000						felt patient was in touch but now
52100						responses have more concrete aspect.
52200	
52300	{8} Dr. - DO YOU HAVE ANY THOUGHT
52400	          AS TO WHY THESE TWO
52500	          GROUPS WERE AFTER YOU?
52600	    Pt. - BECAUSE I STOPPED SOME 	Response seems far fetched 
52700	          OF THEIR DRUG SUPPLY		and hard to believe unless 
52800						he was a narcotic agent which 
52900						I doubt. Sounds somewhat 
53000	      {RESPONSE 9 PATIENT 9}		grandiose, magical, paranoid
53100						flavor, in general indicates 
53200						he's psychotic, paranoid 
53300						schizophrenic with delusions  
53400						about these two groups and 
53500						I wouldn't rule out
53600						some hallucinations as well.
53700						Appropriateness of response 
53800						answers question in concrete 
53900						but unbelievable way.
54000	
54100	6.5 ANALYSIS (1)
54200		Prospective protocol judges (N=105) were selected from the 1970
54300	American  Psychiatric  Association  Directory using a table of random
54400	numbers. They  were  initially  not  informed  that  a  computer  was
54500	involved.  (After the experiment the judges were fully informed as to
54600	its purpose and results.) The  105  names  were  divided  into  eight
54700	groups.  Each  member  of  a  group  was  sent  transcripts  of three
54800	interviews along with a cover letter requesting  his    participation
54900	in the experiment. The interview transcripts consisted of:
55000		1) An interview conducted by one of the eight judges with the
55100		  paranoid model,
55200		2) An interview conducted by the same interview judge with a 
55300		  human paranoid patient, and
55400		3) An interview conducted by a different psychiatrist with a 
55500		  human patient who was not clinically paranoid.
55600	
55700	After each input-output pair in the transcripts there were two  lines
55800	of  rating numbers such that the protocol judges could circle numbers
55900	corresponding to their ratings of both the previous response  of  the
56000	patient,  and  an  overall  evaluation of the patient on the paranoid
56100	continuum. Thirty-three protocol judges returned the rated protocols
56200	properly filled out and all were used in our data.
56300	
56400		The  interviews  with  nonparanoid  patients were included to
56500	control for the  hypothesis  that  any  teletyped  interview  with  a
56600	patient  might  be  judged  "paranoid". However, virtually all of the
56700	ratings of the nonparanoid interviews were zero for paranoia. Hence
56800	the hypothesis was falsified.
56900	
57000	
57100		The first index of resemblance examined was  simple:  namely,
57200	the  final  overall  rating  given  the  patient  and  the model. The
57300	question was: "Which was rated as being more paranoid,  the  patient,
57400	the model, or neither?" (See Table 1.) The protocol judges were more
57500	likely than the interview judges to distinguish the overall paranoid
57600	level of the model from that of the patient. The interview judges gave
57700	tied scores to the model and the patient in 37.5% of the paired
57800	interviews, the protocol judges in only 9%. Of the 35 non-tied paired ratings, 15 rated the
57900	model as being more paranoid.  If p is the theoretical probability of
58000	a  judge  judging  the  model  more  paranoid  than  a human paranoid
58100	patient, we find the 95% confidence interval for  p  to  be  0.27  to
58200	0.59.  Since p=0.5 indicates indistinguishability of overall ratings
58300	of model and patient, and the interval around our observed p=0.43
58400	includes 0.5, the results support the claim that the model is a
58500	successful simulation of a paranoid patient.
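
The arithmetic behind this interval can be sketched as follows. The text does not say which interval method was used; the normal-approximation (Wald) interval below is an assumption, as are the function and variable names. The counts are taken from the text: 41 paired ratings (8 interview judges and 33 protocol judges), of which 37.5% of 8 and about 9% of 33 were ties.

```python
import math

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p_hat = successes / trials
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, p_hat - half_width, p_hat + half_width

# Ties: 37.5% of the 8 interview judges plus about 9% of the 33 protocol judges.
ties = round(0.375 * 8) + round(0.09 * 33)
non_tied = (8 + 33) - ties

# 15 of the non-tied ratings placed the model above the patient.
p_hat, low, high = wald_interval(15, non_tied)
print(f"non-tied = {non_tied}, p = {p_hat:.3f}, interval = {low:.3f} to {high:.3f}")
```

Under this assumption the result agrees with the reported 0.27 to 0.59 to within rounding, and the interval contains 0.5.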
58600	
58700		Separate analysis of the strong and weak  versions  of  PARRY
58800	indicated  that the strong model was indeed judged more paranoid than
58900	the paranoid patients, the weak version less paranoid. Thus a  change
59000	in  the  parameter  structure of the paranoid model produced a change
59100	along the dimension of paranoid behavior in the expected direction.
59200	
59300	(TABLE 1
59400		Relative final overall ratings of paranoid model vs. paranoid
59500	patient indicating which was given highest overall rating of paranoia
59600	at end of interview.)
59700		(INSERT TABLE 1 HERE)
59800	
59900	
60000	
60100	
60200	
60300	
60400	
60500	
60600	6.6 ANALYSIS (2)
60700		The second index of resemblance is a more  sensitive  measure
60800	based on the two series of response ratings in the paired interviews.
60900	The  statistic  used  is  basically  the  standardized   Mann-Whitney
61000	statistic (Siegel, 1956).
61100			(INSERT EQUATION HERE)
61200	
61300	where R is the sum of the ranks of the response ratings in the series
61400	of ratings given to the model, n the number of responses given by the
61500	model,  m  the  number  of  responses  given  by the patient.  If the
61600	ratings given by a judge are randomly allocated to model and patient,
61700	i.e. model and patient are indistinguishable in response ratings, the
61800	expected value of Z is 0, with unit standard  deviation.   If  higher
61900	ratings are more likely to be assigned to the model, Z is positive;
62000	conversely, negative values of Z indicate greater likelihood of
62100	assigning  higher  ratings to the patient. Each judge in evaluating a
62200	pair of interviews generates a single value of Z.
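
A minimal sketch of such a statistic, using the symbols defined above, is given here. It assumes the textbook standardized rank-sum form, Z = (R - n(n+m+1)/2) / sqrt(nm(n+m+1)/12), with tied ratings given the average of the ranks they span and no tie correction to the variance. The exact form used in the study is not reproduced in this text, so this is an illustration and not the authors' equation; the function names and sample ratings are hypothetical.

```python
import math

def midranks(values):
    """Rank values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def standardized_rank_sum(model_ratings, patient_ratings):
    """Z for one judge: positive when the model tends to get higher ratings."""
    n, m = len(model_ratings), len(patient_ratings)
    ranks = midranks(list(model_ratings) + list(patient_ratings))
    R = sum(ranks[:n])                          # rank sum of the model's ratings
    expected_R = n * (n + m + 1) / 2            # mean of R under random allocation
    sd_R = math.sqrt(n * m * (n + m + 1) / 12)  # SD of R, ignoring ties
    return (R - expected_R) / sd_R

# Hypothetical response ratings from one judge for one pair of interviews.
z = standardized_rank_sum([9, 6, 9, 2, 1], [4, 6, 6, 7])
print(f"Z = {z:.2f}")
```

With many tied ratings on a ten-point scale, the uncorrected variance slightly overstates the spread of R, so values of Z computed this way are somewhat conservative.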
62300	
62400		The overall mean of the Z scores was +0.044 with standard
62500	deviation 1.68 (df=40).  Thus the overall 95% confidence interval for
62600	the asymptotic mean value of Z is -0.485 to +0.573. The range of Z
62700	values is -3.8 to +4.46. The length of the confidence interval  is  a
62800	result  of  the  large variance which itself is mainly related to the
62900	contrast between the weak and strong versions.  (See TABLES 2 and 3).
63000	Once  again the strong version of the model is more paranoid than the
63100	patients, the weak version less paranoid.
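
The reported interval can be approximately reproduced from the summary statistics. The sketch below assumes a t-based interval on 41 Z scores (df=40), takes the critical value 2.021 from standard t tables, and takes the mean as +0.044, the midpoint of the reported interval. These are assumptions about the computation, not a record of it.

```python
import math

def mean_confidence_interval(mean, sd, n, t_crit):
    """Two-sided confidence interval for a mean, given a t critical value."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Assumed inputs: mean +0.044, standard deviation 1.68, 41 judges, t(.975, 40) = 2.021
low, high = mean_confidence_interval(0.044, 1.68, 41, 2.021)
print(round(low, 3), round(high, 3))
```

The result agrees with the interval quoted above to within rounding of the inputs.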
63200	
63300		(INSERT TABLE 2)
63400		(SUMMARY STATISTICS OF Z RATINGS BY GROUP)
63500	
63600	
63700	
63800	
63900	
64000	
64100	
64200	
64300	
64400		It  is  not  surprising that results using the two indices of
64500	resemblance are parallel, since the indices are highly  interrelated.
64600	The  mean  Z value for the 15 interviews on which the model was rated
64700	more paranoid was +1.28, on the 6 where model and patient tied: 0.41,
64800	on the 20 in which the patient was more paranoid: -0.993.  The two
64900	indices disagreed in 8 interviews: Z was positive in 6 interviews
65000	where the patient was given the higher overall rating, and negative in
65100	2 interviews where the model was rated more paranoid.
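
As a check on the arithmetic, the three subgroup means can be recombined into an overall mean by weighting each with its number of interviews. The short sketch below uses only the figures quoted in this paragraph; the function name is illustrative.

```python
def pooled_mean(groups):
    """Weighted mean of subgroup means; groups is a list of (count, mean)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

# (number of interviews, mean Z) for model-higher, tied, and patient-higher
groups = [(15, 1.28), (6, 0.41), (20, -0.993)]
overall = pooled_mean(groups)
print(round(overall, 3))
```

The pooled value is about +0.044, which is the midpoint of the 95% confidence interval reported earlier for the mean of Z (-0.485 to +0.573).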
65200	
65300	(INSERT TABLE 3)
65400	(Analysis of Variance of Z Ratings)
65500	
65600	
65700	
65800	
65900	
66000	
66100	
66200	
66300	
66400	
66500	
66600	
66700	
66800		It is worth emphasizing that these tests  invited  refutation
66900	of the model.   The experimental design of the tests put the model in
67000	jeopardy of falsification.   If the paranoid model  did  not  survive
67100	these  tests,  i.e.     if  it were not considered paranoid by expert
67200	judges and if there  were  no  correlation  between  the  weak-strong
67300	versions of the model and the severity ratings of the judges, then no
67400	claim regarding the success of the simulation could be made. Survival
67500	of a model through a falsification procedure constitutes a
67600	validating step.
67700	
67800	6.7 ANALYSIS (3) THE MACHINE QUESTION
67900		For a long time people have wondered how to distinguish a man
68000	from an imitation of a man. The Greeks made statues so lifelike, it
68100	is said, that they had to be chained down to keep them from walking away.
68200	To distinguish a man from a statue, Galileo suggested tickling each
68300	with a feather.  To distinguish a man from a machine, Descartes
68400	suggested  conversational  tests.  Turing's conversational games have
68500	been  discussed  on  p.000.   We  were  curious  how   judges   using
68600	transcripts might answer the machine question, i.e. which interviewee
68700	is a human and which is the computer model?
68800		To  ask  the machine-question, we sent interview transcripts,
68900	one with a patient and one with PARRY, to 100 psychiatrists  randomly
69000	selected from the Directory of American Specialists and the Directory
69100	of the American Psychiatric Association.  Of the 41 replies, 21 (51%)
69200	made the correct identification while 20 (49%) were wrong.   Based on
69300	this random sample of 41 psychiatrists, the 95% confidence interval
69400	for the percentage correct is 35.9% to 66.5%, which includes the 50% expected by chance.
69500		Psychiatrists   are   considered  expert  judges  of  patient
69600	interview behavior but they are unfamiliar with computers.  Hence  we
69700	conducted  the  same  test  with  100  computer  scientists  randomly
69800	selected from the membership list of the  Association  for  Computing
69900	Machinery, ACM.  Of the 67 replies, 32 (48%) were right and 35 (52%)
70000	were wrong. Based on this random sample of 67 computer scientists the
70100	95% confidence interval ranges from 36% to 60%, again including the
70200	chance level of 50%.
70300		So both computer scientists and psychiatrists were unable, at
70400	better than a chance level, to distinguish transcripts of interviews
70500	with the model from transcripts of interviews with real patients.
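
The intervals quoted for the two groups of judges appear consistent with the normal-approximation interval for a proportion, p ± 1.96 sqrt(p(1-p)/N). The text does not state which method was used, so the following is a sketch under that assumption; the function name is illustrative.

```python
import math

def proportion_interval(correct, replies, z=1.96):
    """Normal-approximation 95% interval for the proportion of correct answers."""
    p = correct / replies
    half_width = z * math.sqrt(p * (1 - p) / replies)
    return p - half_width, p + half_width

# Psychiatrists: 21 correct identifications out of 41 replies
psy_low, psy_high = proportion_interval(21, 41)
# Computer scientists: 32 correct identifications out of 67 replies
cs_low, cs_high = proportion_interval(32, 67)

print(round(100 * psy_low, 1), round(100 * psy_high, 1))
print(round(100 * cs_low), round(100 * cs_high))
```

Both intervals contain 50%, the rate expected if judges were guessing.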
70600		But  what  do  we  learn from asking the machine question and
70700	finding that the distinction is not made? What we would most like  to
70800	know  is  how  to improve the model.  Simulation models do not spring
70900	forth in a complete, perfect and final form; they must  be  gradually
71000	developed  over  time.  Perhaps the patient-model distinction might be
71100	made if we allowed a large number of expert  judges  to  conduct  the
71200	interviews  themselves  rather  than  studying  transcripts  of other
71300	interviewers.  This would indicate that the model must  be  improved.
71400	But unless we systematically investigated how the judges succeeded in
71500	making the discrimination, we would not  know  what  aspects  of  the
71600	model  to  work  on.  The logistics of such a design are immense, and
71700	obtaining a large number of judges for  sound  statistical  inference
71800	would require an effort incommensurate with the information yielded.
71900	
72000	6.8 ANALYSIS (4)  MULTIDIMENSIONAL EVALUATION 
72100		A more efficient and informative way to use Turing-like tests
72200	is  to  ask  judges  to  make  ratings  along  scaled dimensions from
72300	teletyped interviews. This might  be  called  asking  the  "dimension
72400	question".   One  can then compare scaled ratings of the patients and
72500	the model in order to determine precisely where and by how much  they
72600	differ.   In constructing our model we strove for one which exhibited
72700	indistinguishability along  some  dimensions  and  distinguishability
72800	along others. That is, we wanted the model to converge on what it was
72900	intended to simulate and to diverge from that which it was not.
73000		Paired-interview   transcripts   were  sent  to  another  400
73100	randomly selected psychiatrists asking them to rate the responses  of
73200	the two `patients' along multiple dimensions. The judges were divided
73300	into groups, each judge being asked to rate  responses  of  each  I-O
73400	pair  in  the  interviews along four dimensions.  The total number of
73500	dimensions in this test  was   twelve:  linguistic  noncomprehension,
73600	thought  disorder,  organic brain syndrome, bizarreness, anger, fear,
73700	ideas of reference, delusions, mistrust,  depression,  suspiciousness
73800	and  mania.  There  were  three  groups  of  judges, each group being
73900	assigned  4  of  the  12  dimensions.   These  are  dimensions  which
74000	psychiatrists commonly use in evaluating patients.
74100			(INSERT TABLE 4 HERE)
74200		Table 4 shows there were significant differences, with  PARRY
74300	receiving   higher   scores   along   the  dimensions  of  linguistic
74400	noncomprehension, thought disorder, bizarreness, anger, mistrust  and
74500	suspiciousness.  On  the  delusion  dimension the patients were rated
74600	significantly higher.   There were no significant  differences  along
74700	the  dimensions  of organic brain syndrome, fear, ideas of reference,
74800	depression and mania.
74900		Whereas    tests   asking   the   machine-question   indicate
75000	indistinguishability at  the  gross  level,  a  study  of  the  finer
75100	structure  of  the  model's  behavior  through  ratings  along scaled
75200	dimensions shows significant differences between patients and  model.
75300	These differences are of help to us in suggesting what aspects of the
75400	model should be modified to improve its performance.   The  graph  of
75500	Fig.  2 shows that no modifications are necessary along the dimension
75600	of  "organic  brain  syndrome".   But  it  is  clear   that   PARRY'S
75700	language-comprehension  might  be improved. Then a future dimensional
75800	test would tell whether improvement had occurred  and  by  how  much.
75900	Successive  identification  of particular areas of failure provides a
76000	type of sensitivity analysis  which  makes  clear  what  improvements
76100	should be pursued in developing more adequate model versions.
76200		(INSERT FIG. 2 HERE)
76300	
76400	6.9 ANALYSIS (5)  A RANDOM MODEL
76500		Further evidence that  the  machine-question  is  too  low  a
76600	hurdle and too insensitive a test comes from the following
76700	experiment.  In this test we constructed a random version of the
76800	paranoid model (RANDOM-PARRY) which utilized PARRY'S output
76900	statements but selected them at random, independent of what the
77000	interviewer  said.  Two  psychiatrists conducted interviews with this
77100	model, transcripts of which were paired with patient  interviews  and
77200	sent   to   200  randomly  selected  psychiatrists  asking  both  the
77300	machine-question and the dimension-question.   Of the 69  replies  to
77400	the  machine  question, 34 (49%) were right and 35 (51%) wrong. Based
77500	on this  random  sample  of  69  psychiatrists,  the  95%  confidence
77600	interval  ranges  from  39  to 63, again indicating a chance level of
77700	agreement. When a poor model, such as a random one, passes a test, it
77800	suggests the test is weak.
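
The idea of the random version can be sketched in a few lines. This is an illustration only, not the original program: the statement list is a hypothetical placeholder standing in for the model's actual output statements, and the names are invented for the example.

```python
import random

def make_random_responder(statements, seed=None):
    """Build a responder that draws a reply at random from a fixed stock."""
    rng = random.Random(seed)

    def respond(interviewer_input):
        # The input is deliberately ignored: the reply is chosen
        # independently of what the interviewer said.
        return rng.choice(statements)

    return respond

# Hypothetical placeholders, not actual model output
stock = ["STATEMENT A", "STATEMENT B", "STATEMENT C", "STATEMENT D"]
respond = make_random_responder(stock, seed=1)
reply = respond("HOW ARE YOU TODAY?")
```

Because the interviewer's input plays no part in the selection, two responders started from the same seed produce the same sequence of replies whatever is typed to them.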
77900		(INSERT TABLE 5 HERE)
78000		Although a distinction is not made when  the  simple  machine
78100	question is asked, definite distinctions ARE made when judgements are
78200	requested  along  specific  dimensions.    As  shown  in   Table   5,
78300	significant  differences  appear  along  the dimensions of linguistic
78400	noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
78500	rated  higher.   On  these  particular  dimensions we can construct a
78600	continuum in which the random version  represents  one  extreme,  the
78700	actual patients another. Nonrandom PARRY lies somewhere between these
78800	two extremes, indicating that it performs significantly  better  than
78900	the random version but still requires improvement before it can be
79000	considered   indistinguishable   from   patients  relative  to  these
79100	dimensions. Table 6 presents t values for  differences  between  mean
79200	ratings  of  PARRY  and  RANDOM-PARRY. (See Table 6 and Fig.2 for the
79300	mean ratings).
79400		(INSERT TABLE 6 AND FIG 2 HERE)
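
A t value for the difference between two mean ratings on a single dimension might be computed as sketched below. The text does not say which form of the t test was used; this sketch assumes the unequal-variance (Welch) form, and the ratings shown are made-up numbers for illustration, not data from the study.

```python
import math
import statistics

def welch_t(ratings_a, ratings_b):
    """t value for the difference of two means, not assuming equal variances."""
    mean_a = statistics.mean(ratings_a)
    mean_b = statistics.mean(ratings_b)
    var_a = statistics.variance(ratings_a)  # sample variance (n - 1 divisor)
    var_b = statistics.variance(ratings_b)
    se = math.sqrt(var_a / len(ratings_a) + var_b / len(ratings_b))
    return (mean_a - mean_b) / se

# Hypothetical ratings on one dimension for two versions of the model
version_a = [1, 2, 3, 4, 5]
version_b = [2, 4, 6, 8, 10]
t_value = welch_t(version_a, version_b)
```

A negative t value here means the first series received the lower mean rating; its significance would be judged against a t table with the appropriate degrees of freedom.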
79500		These  studies  indicate  that  a  more  useful  way  to  use
79600	Turing-like  tests  is  to  ask  expert  judges to make ratings along
79700	multiple dimensions that are essential to the model.   Thus the model
79800	can serve as an instrument for its own perfection.  A good validation
79900	procedure has criteria for better or  worse  approximations.   Useful
80000	tests do not necessarily prove a model; they probe it for its
80100	strengths and weaknesses and clarify what  is  to  be  done  next  in
80200	modifying and repairing the model. Simply asking the machine-question
80300	yields little information relevant to what  the  model  builder  most
80400	wants  to know, namely, along which dimensions does the model need to
80500	be modified in order to effect an improvement in its performance?
80600	
80700		To conclude, it  is  perhaps  historically  significant  that
80800	these  tests  were  conducted at all. To my knowledge, no one to date
80900	has subjected an  interactive  simulation  model  of  human  symbolic
81000	processes  to dimensional indistinguishability tests. These tests set
81100	a precedent and provide a standard  against  which  competing  models
81200	might be measured.